8 Tutorial 6: Variance of the OLS estimator and hypothesis testing

8.1 Block 1: Motivation — Why Isn’t Unbiasedness Enough? (3 min)

The Simple Linear Regression Assumptions

We work with the model \(y = \beta_0 + \beta_1 x + u\) under the following assumptions. You need to know these by name — each one will be explicitly cited in the derivations below.

| Label | Name | Statement |
|-------|------|-----------|
| SLR.1 | Linear in Parameters | \(y = \beta_0 + \beta_1 x + u\) |
| SLR.2 | Random Sampling | \(\{(x_i, y_i)\}_{i=1}^n\) are i.i.d. draws |
| SLR.3 | Sample Variation in \(x\) | \(\text{SST}_x = \sum (x_i - \bar{x})^2 > 0\) |
| SLR.4 | Zero Conditional Mean | \(E[u \mid x] = 0\) |
| SLR.5 | Homoskedasticity | \(\text{Var}(u \mid x) = \sigma^2\) (constant) |
| SLR.6 | Normality | \(u \mid x \sim N(0, \sigma^2)\) |

What we proved so far: Under SLR.1–SLR.4, the OLS estimator is unbiased: \(E[\hat{\beta}_1] = \beta_1\). But unbiasedness says only that \(\hat{\beta}_1\) is centred at \(\beta_1\) on average across all possible samples. It says nothing about how far any single estimate might be from the truth.

What we need now: To know how precise \(\hat{\beta}_1\) is, we need its variance. Adding SLR.5 allows us to derive \(\text{Var}(\hat{\beta}_1)\). Adding SLR.6 on top gives us the exact sampling distribution, which enables hypothesis tests and confidence intervals.

Theorem 2.2 (Wooldridge) — Sampling Variance of \(\hat{\beta}_1\)

Under SLR.1–SLR.5:

\[\boxed{\text{Var}(\hat{\beta}_1 \mid \mathbf{x}) = \frac{\sigma^2}{\text{SST}_x}} \qquad\text{where}\quad \text{SST}_x = \sum_{i=1}^{n}(x_i - \bar{x})^2\]

The proof relies on a key representation of \(\hat{\beta}_1\) that isolates the source of randomness. We derive it step by step.

Step 1: Start from the OLS formula. \[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\text{SST}_x}\]

Step 2: Substitute the model (SLR.1). Since \(y_i = \beta_0 + \beta_1 x_i + u_i\), taking the sample mean gives \(\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}\). Subtracting: \[y_i - \bar{y} = \beta_1(x_i - \bar{x}) + (u_i - \bar{u})\]

Step 3: Expand the numerator. \[\sum(x_i - \bar{x})(y_i - \bar{y}) = \sum(x_i - \bar{x})\bigl[\beta_1(x_i - \bar{x}) + (u_i - \bar{u})\bigr] = \beta_1 \underbrace{\sum(x_i - \bar{x})^2}_{=\,\text{SST}_x} + \sum(x_i - \bar{x})(u_i - \bar{u})\]

Step 4: Eliminate \(\bar{u}\) from the second term. \[\sum(x_i - \bar{x})(u_i - \bar{u}) = \sum(x_i - \bar{x})\,u_i - \bar{u}\underbrace{\sum(x_i - \bar{x})}_{=\,0} = \sum(x_i - \bar{x})\,u_i\]

The key fact is \(\sum(x_i - \bar{x}) = 0\) (deviations from the mean always sum to zero).

Step 5: Divide by \(\text{SST}_x\). \[\hat{\beta}_1 = \frac{\beta_1 \cdot \text{SST}_x + \sum(x_i - \bar{x})\,u_i}{\text{SST}_x} = \beta_1 + \frac{\sum(x_i - \bar{x})\,u_i}{\text{SST}_x}\]

\[\boxed{\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} w_i u_i, \qquad w_i = \frac{x_i - \bar{x}}{\text{SST}_x}}\]

Interpretation: \(\hat{\beta}_1\) equals the true parameter \(\beta_1\) plus a weighted sum of the unobserved errors \(u_1, \ldots, u_n\). The weights \(w_i\) depend only on the \(x\)-values, so conditional on \(\mathbf{x}\) they are constants. The only source of randomness in \(\hat{\beta}_1\) is the errors \(u_i\).
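
The representation can be checked numerically. The sketch below (Python for convenience, since the tutorial's later computer output is from R; all parameter values and the \(x\)-grid are illustrative choices) draws one sample from the model and confirms that the OLS slope computed from the data coincides with \(\beta_1 + \sum_i w_i u_i\):

```python
import random

# Illustrative parameters (not from the tutorial's data): beta0, beta1, sigma.
random.seed(42)
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [1, 3, 5, 7, 9]
u = [random.gauss(0, sigma) for _ in x]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
sst_x = sum((xi - xbar) ** 2 for xi in x)

# Direct OLS formula for the slope
beta1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x

# Representation: true beta1 plus a weighted sum of the (here observable) errors
w = [(xi - xbar) / sst_x for xi in x]
beta1_via_errors = beta1 + sum(wi * ui for wi, ui in zip(w, u))

print(beta1_hat, beta1_via_errors)  # the two agree to floating-point precision
```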


8.2 Block 2: Deriving and Computing \(\text{Var}(\hat{\beta}_1)\) (12 min)

Question 1 (Derivation and numerical computation)

(a) Starting from the representation \(\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} w_i u_i\), derive \(\text{Var}(\hat{\beta}_1 \mid \mathbf{x}) = \sigma^2/\text{SST}_x\).

Hint: Follow these steps — (i) Why can you drop \(\beta_1\) from the variance? (ii) Expand \(\text{Var}(\sum w_i u_i \mid \mathbf{x})\). Which assumption eliminates the covariance terms? (iii) Which assumption makes \(\text{Var}(u_i \mid \mathbf{x})\) the same for all \(i\)? (iv) Simplify \(\sum w_i^2\).

Solution

We start from: \[\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} w_i u_i \qquad\text{where}\quad w_i = \frac{x_i - \bar{x}}{\text{SST}_x}\]

Step (i): Drop the constant \(\beta_1\).

By SLR.1 (Linear in Parameters), the model is \(y = \beta_0 + \beta_1 x + u\), where \(\beta_0\) and \(\beta_1\) are fixed, unknown population parameters — they are constants, not random variables. Since adding a constant to a random variable does not change its variance (recall: \(\text{Var}(a + X) = \text{Var}(X)\) for any constant \(a\)):

\[\text{Var}(\hat{\beta}_1 \mid \mathbf{x}) = \text{Var}\!\left(\beta_1 + \sum_{i=1}^n w_i u_i \;\middle|\; \mathbf{x}\right) = \text{Var}\!\left(\sum_{i=1}^n w_i u_i \;\middle|\; \mathbf{x}\right)\]

Note also that we condition on \(\mathbf{x} = (x_1, \ldots, x_n)\), so the weights \(w_i = (x_i - \bar{x})/\text{SST}_x\) are treated as constants (they depend only on the \(x\)-values). The only random variables in this expression are \(u_1, \ldots, u_n\).

Step (ii): Expand the variance of the weighted sum.

By the general formula for the variance of a linear combination:

\[\text{Var}\!\left(\sum_{i=1}^n w_i u_i \;\middle|\; \mathbf{x}\right) = \underbrace{\sum_{i=1}^n w_i^2\,\text{Var}(u_i \mid \mathbf{x})}_{\text{variance terms}} + \underbrace{\sum_{\substack{i,j=1 \\ i \ne j}}^{n} w_i w_j\,\text{Cov}(u_i, u_j \mid \mathbf{x})}_{\text{covariance terms}}\]

Now we apply SLR.2 (Random Sampling): the observations \(\{(x_i, y_i)\}_{i=1}^n\) are drawn independently from the population. Since \(y_i = \beta_0 + \beta_1 x_i + u_i\), the errors \(u_1, \ldots, u_n\) are also independent of each other conditional on \(\mathbf{x}\).

Independence implies that \(\text{Cov}(u_i, u_j \mid \mathbf{x}) = 0\) for all \(i \ne j\). Therefore, all cross-terms vanish:

\[= \sum_{i=1}^n w_i^2\,\text{Var}(u_i \mid \mathbf{x}) + 0 = \sum_{i=1}^n w_i^2\,\text{Var}(u_i \mid \mathbf{x})\]

Step (iii): Apply homoskedasticity.

Now apply SLR.5 (Homoskedasticity): the variance of the error term is the same for all observations, regardless of the value of \(x\):

\[\text{Var}(u_i \mid \mathbf{x}) = \sigma^2 \quad\text{for all } i = 1, \ldots, n\]

Since \(\sigma^2\) does not depend on \(i\), it can be factored out of the sum:

\[\sum_{i=1}^n w_i^2\,\text{Var}(u_i \mid \mathbf{x}) = \sum_{i=1}^n w_i^2 \cdot \sigma^2 = \sigma^2 \sum_{i=1}^n w_i^2\]

Why SLR.5 is critical: Without homoskedasticity, each observation could have a different error variance \(\text{Var}(u_i \mid x_i) = \sigma_i^2\), and we could not factor out a single \(\sigma^2\). The formula would become \(\sum w_i^2 \sigma_i^2\), which depends on each individual \(\sigma_i^2\) and is much harder to work with. This is exactly the complication that arises under heteroskedasticity.

Step (iv): Simplify \(\sum w_i^2\).

Recall that \(w_i = (x_i - \bar{x})/\text{SST}_x\). Squaring and summing:

\[\sum_{i=1}^n w_i^2 = \sum_{i=1}^n \frac{(x_i - \bar{x})^2}{\text{SST}_x^2} = \frac{1}{\text{SST}_x^2} \sum_{i=1}^n (x_i - \bar{x})^2 = \frac{\text{SST}_x}{\text{SST}_x^2} = \frac{1}{\text{SST}_x}\]

In the third equality, we used the definition \(\text{SST}_x = \sum_{i=1}^n (x_i - \bar{x})^2\). Note that SLR.3 (Sample Variation in \(x\)) guarantees \(\text{SST}_x > 0\), so this division is valid.

Combining all four steps:

\[\text{Var}(\hat{\beta}_1 \mid \mathbf{x}) \underset{\text{(i)}}{=} \text{Var}\!\left(\sum w_i u_i \mid \mathbf{x}\right) \underset{\text{(ii)}}{=} \sum w_i^2\,\text{Var}(u_i \mid \mathbf{x}) \underset{\text{(iii)}}{=} \sigma^2 \sum w_i^2 \underset{\text{(iv)}}{=} \sigma^2 \cdot \frac{1}{\text{SST}_x}\]

\[\boxed{\text{Var}(\hat{\beta}_1 \mid \mathbf{x}) = \frac{\sigma^2}{\text{SST}_x}} \qquad\square\]

Summary of assumptions used:

  • SLR.1 (Linear in Parameters): \(\beta_1\) is a constant, so it drops from the variance.
  • SLR.2 (Random Sampling): errors are independent, so covariance terms are zero.
  • SLR.3 (Sample Variation): \(\text{SST}_x > 0\), so division is valid.
  • SLR.5 (Homoskedasticity): all \(\text{Var}(u_i \mid \mathbf{x}) = \sigma^2\), so \(\sigma^2\) factors out.

Note: SLR.4 (Zero Conditional Mean) is not needed for the variance formula — it was needed for unbiasedness but not here.
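
As a sanity check on Theorem 2.2, here is a short Monte Carlo sketch (Python; the parameter choices are illustrative and match part (b) below: \(x\)-values \(\{1,3,5,7,9\}\) and \(\sigma^2 = 10\), so the theoretical conditional variance is \(10/40 = 0.25\)):

```python
import random
import statistics

random.seed(0)
x = [1, 3, 5, 7, 9]
beta0, beta1, sigma2 = 0.0, 1.0, 10.0  # illustrative values
xbar = sum(x) / len(x)
sst_x = sum((xi - xbar) ** 2 for xi in x)  # 40

def ols_slope():
    # Draw fresh errors; keep the x-values fixed (we condition on x).
    u = [random.gauss(0, sigma2 ** 0.5) for _ in x]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    ybar = sum(y) / len(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x

draws = [ols_slope() for _ in range(100_000)]
print(statistics.variance(draws), sigma2 / sst_x)  # both close to 0.25
```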

(b) Suppose \(n = 5\) observations with \(x\)-values \(\{1, 3, 5, 7, 9\}\) and \(\sigma^2 = 10\). Compute \(\bar{x}\), \(\text{SST}_x\), \(\text{Var}(\hat{\beta}_1)\), and \(\text{sd}(\hat{\beta}_1)\).

Solution

Step 1: Compute \(\bar{x}\). \[\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i = \frac{1 + 3 + 5 + 7 + 9}{5} = \frac{25}{5} = 5\]

Step 2: Compute each deviation \((x_i - \bar{x})\) and its square.

| \(x_i\) | \(x_i - \bar{x}\) | \((x_i - \bar{x})^2\) |
|--------|--------------------|------------------------|
| 1 | \(1 - 5 = -4\) | \(16\) |
| 3 | \(3 - 5 = -2\) | \(4\) |
| 5 | \(5 - 5 = 0\) | \(0\) |
| 7 | \(7 - 5 = 2\) | \(4\) |
| 9 | \(9 - 5 = 4\) | \(16\) |
| Sum | | \(\text{SST}_x = 40\) |

Step 3: Apply the formula. \[\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\text{SST}_x} = \frac{10}{40} = \boxed{0.25}\]

\[\text{sd}(\hat{\beta}_1) = \sqrt{\text{Var}(\hat{\beta}_1)} = \sqrt{0.25} = \boxed{0.5}\]

Interpretation: Across repeated samples with these same \(x\)-values, the OLS slope estimate \(\hat{\beta}_1\) would have a standard deviation of \(0.5\): a typical estimate misses the true \(\beta_1\) by roughly \(0.5\) units.
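
The arithmetic of part (b), written out in Python for readers who want to reproduce it:

```python
# Part (b): x-values {1, 3, 5, 7, 9}, sigma^2 = 10 (given in the question).
x = [1, 3, 5, 7, 9]
sigma2 = 10
xbar = sum(x) / len(x)                     # 5.0
sst_x = sum((xi - xbar) ** 2 for xi in x)  # 40.0
var_b1 = sigma2 / sst_x                    # 0.25
sd_b1 = var_b1 ** 0.5                      # 0.5
print(xbar, sst_x, var_b1, sd_b1)
```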

(c) Based on the formula \(\text{Var}(\hat{\beta}_1) = \sigma^2/\text{SST}_x\), explain what happens to the precision of \(\hat{\beta}_1\) when: (i) the error variance \(\sigma^2\) increases; (ii) we add more observations that are spread out (not concentrated at \(\bar{x}\)); (iii) we add observations that are all equal to \(\bar{x}\).

Solution

The formula \(\text{Var}(\hat{\beta}_1) = \sigma^2/\text{SST}_x\) has two ingredients: \(\sigma^2\) in the numerator and \(\text{SST}_x\) in the denominator.

(i) \(\sigma^2\) increases \(\Longrightarrow\) Var increases (less precise).

\(\sigma^2\) measures the noise in the error term. More noise means the data points are more scattered around the regression line, making it harder to pin down the slope.

Numerical example: With our data (\(\text{SST}_x = 40\)), doubling \(\sigma^2\) from 10 to 20 doubles \(\text{Var}(\hat{\beta}_1)\) from \(0.25\) to \(0.50\).

(ii) Adding spread-out observations increases \(\text{SST}_x\) \(\Longrightarrow\) Var decreases (more precise).

Each new observation at \(x_i \ne \bar{x}\) contributes \((x_i - \bar{x})^2 > 0\) to \(\text{SST}_x\). A larger \(\text{SST}_x\) in the denominator shrinks \(\text{Var}(\hat{\beta}_1)\).

Intuition: More variation in \(x\) gives us more “leverage” to estimate the slope. Imagine fitting a line through data points that are all bunched together vs. spread far apart — the spread-out data pins the slope down much more precisely.

(iii) Adding observations at \(x = \bar{x}\) does not help.

If \(x_i = \bar{x}\), then \((x_i - \bar{x})^2 = 0\), so these observations contribute nothing to \(\text{SST}_x\). \(\text{Var}(\hat{\beta}_1)\) is unchanged.

Numerical example: Adding 100 observations all at \(x = 5\) (the mean) keeps \(\text{SST}_x = 40\), so \(\text{Var}(\hat{\beta}_1)\) stays at \(0.25\).

Intuition: To estimate a slope, you need to see how \(y\) changes as \(x\) changes. If all your new data have the same \(x\), you learn nothing new about the slope — you only learn more about the intercept.
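
A quick numerical illustration of point (iii) (Python sketch): adding 100 observations at the mean leaves \(\text{SST}_x\), and hence the variance, unchanged.

```python
sigma2 = 10
x = [1, 3, 5, 7, 9]
x_more = x + [5] * 100  # 100 extra observations, all at the mean

def sst(xs):
    xbar = sum(xs) / len(xs)
    return sum((xi - xbar) ** 2 for xi in xs)

sst_before, sst_after = sst(x), sst(x_more)
print(sst_before, sst_after)                    # 40.0 both times
print(sigma2 / sst_before, sigma2 / sst_after)  # 0.25 both times
```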

8.3 Block 3: From \(\sigma^2\) to Standard Errors (10 min)

Estimating \(\sigma^2\) and the \(t\)-Distribution

The formula \(\text{Var}(\hat{\beta}_1) = \sigma^2/\text{SST}_x\) involves \(\sigma^2 = \text{Var}(u \mid \mathbf{x})\), which is unknown (we never observe the true errors \(u_i\)). We estimate it from the residuals \(\hat{u}_i = y_i - \hat{y}_i\):

\[\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2 = \frac{\text{SSR}}{n-2}\]

Why \(n-2\) and not \(n\)? We estimated two parameters (\(\hat{\beta}_0\) and \(\hat{\beta}_1\)) to compute the residuals. This “uses up” 2 degrees of freedom. Dividing by \(n-2\) corrects for this and makes \(\hat{\sigma}^2\) an unbiased estimator of \(\sigma^2\): \(E[\hat{\sigma}^2] = \sigma^2\).

The standard error of \(\hat{\beta}_1\) replaces \(\sigma\) with \(\hat{\sigma}\):

\[\text{SE}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\text{SST}_x}}\]

In R, \(\hat{\sigma}\) is reported as Residual standard error and \(\text{SE}(\hat{\beta}_1)\) appears in the Std. Error column.

Adding SLR.6 (Normality: \(u \mid x \sim N(0, \sigma^2)\)), the \(t\)-statistic

\[t = \frac{\hat{\beta}_1 - \beta_1}{\text{SE}(\hat{\beta}_1)} \sim t_{n-2}\]

follows a \(t\)-distribution with \(n - 2\) degrees of freedom. The \(t\) (not \(N(0,1)\)) arises because \(\hat{\sigma}\) in the denominator is itself a random variable — it varies from sample to sample, adding extra uncertainty.

Key distinction:

  • If \(\sigma\) were known: \(Z = \frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{\text{SST}_x}} \sim N(0,1)\) (standard normal)
  • Since \(\sigma\) is unknown: \(t = \frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}/\sqrt{\text{SST}_x}} \sim t_{n-2}\) (\(t\)-distribution, heavier tails)

As \(n \to \infty\), \(\hat{\sigma} \to \sigma\) and \(t_{n-2} \to N(0,1)\), so the distinction vanishes in large samples.

The following R output will be used for Questions 2 and 3. An econometrician regressed y on x using 22 observations.

> model <- lm(y ~ x, data = mydata)
> summary(model)
...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2000     1.3000   4.000 0.000742 ***
x             2.4000     0.8000       A        B
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4 on 20 degrees of freedom

> confint(model)
                2.5 %   97.5 %
(Intercept)  2.488248 7.911752
x                   C        D

Question 2 (Reading R output — variance and standard errors)

(a) From the R output, identify \(\hat{\sigma}\) (the residual standard error) and the number of observations \(n\). Then compute \(\hat{\sigma}^2\) and \(\text{SST}_x\).

Hint: Use \(\text{SE}(\hat{\beta}_1) = \hat{\sigma}/\sqrt{\text{SST}_x}\) and solve for \(\text{SST}_x\).

Solution

Reading \(\hat{\sigma}\) from the output: The line Residual standard error: 4 on 20 degrees of freedom tells us two things:

  • \(\hat{\sigma} = 4\) (the estimated standard deviation of the errors)
  • Degrees of freedom \(= n - 2 = 20\), which means \(n = 22\) observations.

Computing \(\hat{\sigma}^2\): \[\hat{\sigma}^2 = 4^2 = \boxed{16}\]

Computing \(\text{SST}_x\): From the output, the Std. Error column for x gives \(\text{SE}(\hat{\beta}_1) = 0.8\). Using the formula \(\text{SE}(\hat{\beta}_1) = \hat{\sigma}/\sqrt{\text{SST}_x}\):

\[0.8 = \frac{4}{\sqrt{\text{SST}_x}} \quad\Longrightarrow\quad \sqrt{\text{SST}_x} = \frac{4}{0.8} = 5 \quad\Longrightarrow\quad \text{SST}_x = 5^2 = \boxed{25}\]

Verification: \(\text{SE}(\hat{\beta}_1) = \hat{\sigma}/\sqrt{\text{SST}_x} = 4/\sqrt{25} = 4/5 = 0.8\) \(\checkmark\)
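
The back-solving in part (a) can be mirrored in a few lines (Python sketch; the inputs are the values read off the R output above):

```python
sigma_hat = 4.0   # "Residual standard error: 4"
se_b1 = 0.8       # Std. Error for x

sigma2_hat = sigma_hat ** 2        # 16
sst_x = (sigma_hat / se_b1) ** 2   # 25, from SE = sigma_hat / sqrt(SST_x)

# Consistency check: plugging SST_x back in recovers the reported SE.
print(sigma2_hat, sst_x, sigma_hat / sst_x ** 0.5)
```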

(b) Explain in one or two sentences why replacing \(\sigma\) with \(\hat{\sigma}\) changes the distribution of the test statistic from standard normal to \(t\).

Solution

When \(\sigma\) is known, \((\hat{\beta}_1 - \beta_1)/(\sigma/\sqrt{\text{SST}_x})\) is a ratio of a normal random variable (the numerator) over a constant (the denominator), so the ratio is exactly \(N(0,1)\).

When we replace \(\sigma\) with \(\hat{\sigma}\), the denominator \(\hat{\sigma}/\sqrt{\text{SST}_x}\) becomes a random variable — it fluctuates from sample to sample because \(\hat{\sigma}\) is computed from the data. The ratio is now a normal random variable divided by a random estimate of its standard deviation. This ratio has heavier tails than the normal (sometimes \(\hat{\sigma}\) underestimates \(\sigma\), inflating \(|t|\)), producing the \(t_{n-2}\) distribution.

Intuition: The \(t\)-distribution is “wider” than the normal to account for the fact that we don’t know \(\sigma\) exactly. As the sample size grows, \(\hat{\sigma}\) becomes a better and better estimate of \(\sigma\), the extra uncertainty shrinks, and \(t_{n-2} \to N(0,1)\).

(c) Find \(A\) (the \(t\)-value) and \(B\) (the \(p\)-value) in the output. What null hypothesis does this \(t\)-statistic test?

Solution

What R reports by default: The \(t\)-value and \(p\)-value in R’s summary() output always test the null hypothesis

\[H_0\colon \beta_1 = 0 \quad\text{vs.}\quad H_1\colon \beta_1 \ne 0\]

That is, R tests whether the coefficient is significantly different from zero (two-sided).

Finding \(A\) (t-value): The \(t\)-statistic for testing \(H_0\colon \beta_1 = 0\) is:

\[A = t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} = \frac{\text{Estimate}}{\text{Std. Error}} = \frac{2.4000}{0.8000} = \boxed{3.000}\]

Finding \(B\) (p-value): The \(p\)-value is the probability of observing a \(t\)-statistic as extreme as \(|t| = 3\) or more, if \(H_0\) were true:

\[B = P(|t_{20}| > 3) = 2 \cdot P(t_{20} > 3)\]

The factor of 2 appears because this is a two-sided test (we count both tails). In R: 2 * pt(-3, df = 20) \(\approx \boxed{0.007}\).

Meaning: If \(\beta_1\) were truly zero, there would be only a \(0.7\%\) chance of observing a \(t\)-statistic as large as \(3\) in absolute value. This is strong evidence against \(H_0\).
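
The exact p-value is 2 * pt(-3, df = 20) in R. As a rough cross-check that needs no \(t\)-distribution routine, a \(t_{20}\) draw can be simulated as \(Z/\sqrt{\chi^2_{20}/20}\) (Python sketch; the seed and replication count are arbitrary choices):

```python
import random

random.seed(1)
df, t_obs, reps = 20, 3.0, 100_000
hits = 0
for _ in range(reps):
    z = random.gauss(0, 1)
    # chi-square with df degrees of freedom: sum of df squared standard normals
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    if abs(z / (chi2 / df) ** 0.5) > t_obs:
        hits += 1
p_hat = hits / reps
print(p_hat)  # roughly 0.007; the exact value 2*pt(-3, 20) is about 0.0071
```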

8.4 Block 4: Hypothesis Testing (13 min)

Hypothesis Testing and Confidence Intervals — The Procedure

Two-sided test of \(H_0\colon \beta_1 = \beta_{1,0}\) vs. \(H_1\colon \beta_1 \ne \beta_{1,0}\):

  1. State the hypotheses (\(H_0\) and \(H_1\)) and significance level \(\alpha\).
  2. Compute the test statistic: \(t = (\hat{\beta}_1 - \beta_{1,0})/\text{SE}(\hat{\beta}_1)\).
  3. Find the critical value: \(t_{\alpha/2,\, n-2}\).
  4. Decision rule: Reject \(H_0\) if \(|t| > t_{\alpha/2,\, n-2}\).
  5. State the conclusion in context.

One-sided test of \(H_0\colon \beta_1 \le \beta_{1,0}\) vs. \(H_1\colon \beta_1 > \beta_{1,0}\):

  • Same \(t\)-statistic.
  • Reject \(H_0\) if \(t > t_{\alpha,\, n-2}\) (entire \(\alpha\) in one tail).
  • Since \(t_{\alpha,\, n-2} < t_{\alpha/2,\, n-2}\), it is easier to reject in the specified direction.

\((1-\alpha)\) Confidence interval: \(\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot \text{SE}(\hat{\beta}_1)\).

CI–test equivalence (two-sided only): Reject \(H_0\colon \beta_1 = \beta_{1,0}\) at level \(\alpha\) if and only if \(\beta_{1,0}\) falls outside the \((1-\alpha)\) CI.

Use the R output from Block 3. You may use the following critical values for the \(t_{20}\) distribution: \(t_{0.025,\, 20} = 2.086\) (in R: qt(0.975, 20)) and \(t_{0.05,\, 20} = 1.725\) (in R: qt(0.95, 20)).

Question 3 (Hypothesis testing and confidence intervals)

(a) Using \(A\) and \(B\) from Question 2, test \(H_0\colon \beta_1 = 0\) against \(H_1\colon \beta_1 \ne 0\) at the 5% significance level. State your conclusion.

Solution

Step 1: State hypotheses and significance level.

  • \(H_0\colon \beta_1 = 0\) (the variable \(x\) has no effect on \(y\))
  • \(H_1\colon \beta_1 \ne 0\) (the variable \(x\) has some effect on \(y\))
  • This is a two-sided test at \(\alpha = 0.05\).

Step 2: Test statistic.

From Q2(c): \(t = A = 3.000\).

Step 3: Critical value.

For a two-sided test at \(\alpha = 0.05\) with \(\text{df} = 20\): \(t_{0.025,\, 20} = 2.086\).

Step 4: Decision.

Reject \(H_0\) if \(|t| > 2.086\). Since \(|t| = 3.000 > 2.086\), we reject \(H_0\).

Alternative (using the \(p\)-value): Reject \(H_0\) if \(p < \alpha\). Since \(p = B = 0.007 < 0.05 = \alpha\), we reject \(H_0\). \(\checkmark\)

Step 5: Conclusion.

At the 5% significance level, we reject the null hypothesis that \(\beta_1 = 0\). There is statistically significant evidence that \(x\) has an effect on \(y\).

(b) Find \(C\) and \(D\) (the 95% confidence interval for \(\beta_1\)). Show your work.

Solution

Step 1: Formula.

The \((1-\alpha) = 95\%\) confidence interval for \(\beta_1\) is: \[\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot \text{SE}(\hat{\beta}_1)\]

Step 2: Plug in values.

  • \(\hat{\beta}_1 = 2.4\) (from the “Estimate” column)
  • \(t_{0.025,\, 20} = 2.086\) (critical value for 95% CI with 20 df)
  • \(\text{SE}(\hat{\beta}_1) = 0.8\) (from the “Std. Error” column)

Step 3: Compute the margin of error. \[\text{Margin of error} = t_{0.025,\, 20} \times \text{SE}(\hat{\beta}_1) = 2.086 \times 0.8 = 1.669\]

Step 4: Compute the bounds. \[\begin{aligned} C = \text{Lower bound} &= \hat{\beta}_1 - \text{Margin of error} = 2.4 - 1.669 = \boxed{0.731} \\ D = \text{Upper bound} &= \hat{\beta}_1 + \text{Margin of error} = 2.4 + 1.669 = \boxed{4.069} \end{aligned}\]

Step 5: State the interval.

The 95% CI for \(\beta_1\) is \([0.731,\; 4.069]\).

Interpretation: We are 95% confident that the true \(\beta_1\) lies between \(0.731\) and \(4.069\). This means that if we repeated the sampling process many times and computed a 95% CI each time, approximately 95% of those intervals would contain the true \(\beta_1\).

Consistency check with part (a): The CI does not contain \(0\), which is consistent with our rejection of \(H_0\colon \beta_1 = 0\) at 5%.
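
The interval arithmetic in part (b), as a Python sketch:

```python
# Inputs from the R output and the quoted critical value qt(0.975, 20).
b1_hat, se, t_crit = 2.4, 0.8, 2.086
margin = t_crit * se               # 1.6688
lo, hi = b1_hat - margin, b1_hat + margin
print(round(lo, 3), round(hi, 3))  # 0.731 and 4.069
```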

(c) Using the confidence interval from (b), test \(H_0\colon \beta_1 = 1\) against \(H_1\colon \beta_1 \ne 1\) at the 5% significance level.

Solution

Step 1: State hypotheses.

  • \(H_0\colon \beta_1 = 1\)
  • \(H_1\colon \beta_1 \ne 1\)
  • Two-sided test at \(\alpha = 0.05\).

Step 2: Apply the CI–test equivalence.

For a two-sided test, we reject \(H_0\colon \beta_1 = \beta_{1,0}\) at level \(\alpha\) if and only if the hypothesized value \(\beta_{1,0}\) falls outside the \((1-\alpha)\) CI. From part (b), the 95% CI is \([0.731,\; 4.069]\).

Step 3: Check whether \(\beta_{1,0} = 1\) is inside or outside the CI.

Since \(0.731 < 1 < 4.069\), the value \(\beta_{1,0} = 1\) is inside the 95% CI.

Step 4: Decision.

We fail to reject \(H_0\) at the 5% significance level. There is not enough evidence to conclude that \(\beta_1 \ne 1\).

Verification via \(t\)-test: \[t = \frac{\hat{\beta}_1 - 1}{\text{SE}(\hat{\beta}_1)} = \frac{2.4 - 1}{0.8} = \frac{1.4}{0.8} = 1.75\]

Decision: \(|t| = 1.75 < 2.086 = t_{0.025,20}\), so we fail to reject. \(\checkmark\)

Both methods (CI approach and \(t\)-test) give the same answer — they are mathematically equivalent for two-sided tests.
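
The equivalence can be verified mechanically (Python sketch using the numbers above):

```python
b1_hat, se, t_crit = 2.4, 0.8, 2.086
lo, hi = b1_hat - t_crit * se, b1_hat + t_crit * se

inside_ci = lo < 1 < hi            # True  -> CI approach: fail to reject
t_stat = (b1_hat - 1) / se         # 1.75
reject = abs(t_stat) > t_crit      # False -> t-test: fail to reject
print(inside_ci, t_stat, reject)
```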

(d) Now test \(H_0\colon \beta_1 \le 1\) against \(H_1\colon \beta_1 > 1\) at the 5% significance level.

Solution

Step 1: State hypotheses and significance level.

  • \(H_0\colon \beta_1 \le 1\)
  • \(H_1\colon \beta_1 > 1\)
  • This is a one-sided (right-tailed) test at \(\alpha = 0.05\).

Step 2: Compute the test statistic.

The \(t\)-statistic uses the boundary value of \(H_0\) (i.e., \(\beta_{1,0} = 1\)): \[t = \frac{\hat{\beta}_1 - \beta_{1,0}}{\text{SE}(\hat{\beta}_1)} = \frac{2.4 - 1}{0.8} = \frac{1.4}{0.8} = 1.75\]

This is the same \(t\)-statistic as in part (c). The difference will be in the critical value.

Step 3: Find the critical value.

For a one-sided test at \(\alpha = 0.05\) with \(\text{df} = 20\): \(t_{0.05,\, 20} = 1.725\).

Notice this is smaller than the two-sided critical value (\(2.086\)). The one-sided test places the entire 5% rejection probability in one tail instead of splitting it as 2.5% in each tail.

Step 4: Decision rule and decision.

For a right-tailed test: Reject \(H_0\) if \(t > t_{0.05, 20} = 1.725\).

Since \(t = 1.75 > 1.725\), we reject \(H_0\).

Step 5: Conclusion.

At the 5% significance level, we reject \(H_0\colon \beta_1 \le 1\) in favor of \(H_1\colon \beta_1 > 1\). There is statistically significant evidence that \(\beta_1 > 1\).
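
The one-sided and two-sided decisions for the same \(t\)-statistic, side by side (Python sketch; the critical values are the ones quoted above for \(t_{20}\)):

```python
t_stat = (2.4 - 1) / 0.8   # 1.75, same statistic as in part (c)
t_one, t_two = 1.725, 2.086  # qt(0.95, 20) and qt(0.975, 20)

reject_one = t_stat > t_one        # True: reject in the one-sided test
reject_two = abs(t_stat) > t_two   # False: fail to reject two-sided
print(reject_one, reject_two)
```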

(e) Parts (c) and (d) test closely related hypotheses about \(\beta_1 = 1\), yet give opposite conclusions. Explain why.

Solution

The key: different critical values.

| Test | \(t\)-stat | Critical value | Exceeds CV? | Decision |
|------|-----------|----------------|-------------|----------|
| Two-sided (c) | \(1.75\) | \(t_{0.025,20} = 2.086\) | No: \(1.75 < 2.086\) | Fail to reject |
| One-sided (d) | \(1.75\) | \(t_{0.05,20} = 1.725\) | Yes: \(1.75 > 1.725\) | Reject |

Why the critical values differ:

  • The two-sided test splits the 5% significance level across both tails: 2.5% in the left tail, 2.5% in the right tail. The critical value \(t_{0.025,20} = 2.086\) is large because it must capture only 2.5% in each tail.

  • The one-sided test places the entire 5% in the right tail (the direction of \(H_1\colon \beta_1 > 1\)). The critical value \(t_{0.05,20} = 1.725\) is smaller because it captures 5% in one tail.

Since \(t = 1.75\) falls between the two critical values (\(1.725 < 1.75 < 2.086\)), it is large enough to reject in the one-sided test but not large enough to reject in the two-sided test.

The trade-off:

  • The one-sided test has more power (higher probability of rejecting \(H_0\)) when the true \(\beta_1\) lies in the direction of \(H_1\) (here, \(\beta_1 > 1\)). It achieves this by concentrating all its rejection region on one side.

  • However, the one-sided test has zero power against departures in the opposite direction (\(\beta_1 < 1\)). If \(\beta_1\) were actually much less than 1, the one-sided test would never reject.

  • The two-sided test can detect departures in either direction, but at the cost of lower power in any single direction.

When to use one-sided: Only when you have strong theoretical reasons to expect the effect in a specific direction before seeing the data. Otherwise, use a two-sided test.

8.5 Block 5: Type I/II Errors, Size, and Power (12 min)

Type I and Type II Errors

When we perform a hypothesis test, we make a decision (reject or fail to reject \(H_0\)) based on sample data. Since the data are random, we can make mistakes:

| Decision | \(H_0\) is actually true | \(H_0\) is actually false |
|----------|--------------------------|---------------------------|
| Reject \(H_0\) | Type I Error | Correct (Power) |
| Fail to reject \(H_0\) | Correct | Type II Error |

  • Type I Error (false positive): Rejecting \(H_0\) when \(H_0\) is true. We conclude there is an effect, but in reality there is none.

  • Type II Error (false negative): Failing to reject \(H_0\) when \(H_0\) is false. We miss a real effect.

Example 1 — Courtroom trial:

  • \(H_0\): The defendant is innocent.
  • \(H_1\): The defendant is guilty.
  • Type I: Convicting an innocent person. The jury finds them guilty, but they didn’t commit the crime.
  • Type II: Acquitting a guilty person. The jury finds them not guilty, but they actually did it.
  • The “beyond reasonable doubt” standard sets a very low \(\alpha\) — society considers convicting an innocent person (Type I) worse than letting a guilty person go free (Type II).

Example 2 — Drug trial:

  • \(H_0\colon \beta_1 = 0\) (the new drug has no effect on blood pressure).
  • \(H_1\colon \beta_1 \ne 0\) (the drug has an effect).
  • Type I: The trial concludes the drug works (rejects \(H_0\)), but in reality it has no effect. A useless drug is marketed.
  • Type II: The trial fails to find an effect (fails to reject \(H_0\)), but the drug actually works. A beneficial drug is never brought to market.
  • \(\alpha = 0.05\) means: if the drug truly has no effect, there is at most a 5% chance we mistakenly conclude it works.

Size and Power of a Test

Size \(= P(\text{reject } H_0 \mid H_0 \text{ is true}) = P(\text{Type I Error}) = \alpha\).

The size is the probability of making a Type I error. By construction, a test at significance level \(\alpha\) has size \(\alpha\): we choose the critical value to make this probability exactly \(\alpha\).

Power \(= P(\text{reject } H_0 \mid H_0 \text{ is false}) = 1 - P(\text{Type II Error})\).

The power is the probability of correctly detecting a real effect. It depends on the true value \(\beta^*\): the farther \(\beta^*\) is from \(\beta_0\) (the null value), the higher the power.

What we want: Low Type I error (\(\alpha\) small) AND high power (\(1 - \text{Type II}\) large). But these are in tension: making \(\alpha\) smaller (harder to reject) also reduces power. The standard compromise is \(\alpha = 0.05\) and hoping for power \(\ge 0.80\).

Question 4 (Type I/II errors and power — numerical exercise)

To simplify the calculations, assume \(\sigma_{\hat{\beta}}\) is known, so the test statistic is \(Z = (\hat{\beta} - \beta_0)/\sigma_{\hat{\beta}} \sim N(0,1)\) under \(H_0\) (standard normal, not \(t\)).

Consider testing \(H_0\colon \beta = 0\) vs. \(H_1\colon \beta \ne 0\) at \(\alpha = 0.05\), with \(\sigma_{\hat{\beta}} = 2\).

(a) Define Type I and Type II error in the context of this test. Give a concrete interpretation if \(\beta\) measures the effect of a job training program on wages (in thousands of dollars).

Solution

Type I Error \(= P(\text{reject } H_0 \mid H_0 \text{ is true})\):

We reject \(H_0\colon \beta = 0\) even though \(\beta\) is actually zero. In context: we conclude that the job training program raises wages, but in reality the program has no effect. We waste resources implementing a useless program.

Type II Error \(= P(\text{fail to reject } H_0 \mid H_0 \text{ is false})\):

We fail to reject \(H_0\colon \beta = 0\) even though \(\beta \ne 0\) (the program actually works). In context: we conclude that the job training program has no effect, and we cancel a program that actually helps workers. The real benefit goes undetected.

(b) The two-sided test rejects \(H_0\) when \(|Z| > 1.96\). Show that the size of this test is exactly \(0.05\).

Solution

The size is the probability of rejecting \(H_0\) when \(H_0\) is true. Under \(H_0\colon \beta = 0\):

\[Z = \frac{\hat{\beta} - 0}{\sigma_{\hat{\beta}}} = \frac{\hat{\beta}}{2} \sim N(0, 1)\]

The test rejects when \(|Z| > 1.96\). So:

\[\begin{aligned} \text{Size} &= P(|Z| > 1.96 \mid H_0) \\ &= P(Z > 1.96) + P(Z < -1.96) & &\text{(split into two tails)}\\ &= (1 - \Phi(1.96)) + \Phi(-1.96) & &\text{(using the normal CDF)} \\ &= 0.025 + 0.025 & &\text{(by symmetry of the normal)} \\ &= \boxed{0.05} \end{aligned}\]

This confirms that the critical value \(z_{0.025} = 1.96\) was chosen to make the Type I error probability exactly \(\alpha = 0.05\). That is the definition of a test at significance level \(\alpha\).
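
The size calculation can be reproduced with the standard normal CDF, available in Python's standard library as statistics.NormalDist:

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Size = P(|Z| > 1.96) under H0, split into the two tails.
size = (1 - Phi(1.96)) + Phi(-1.96)
print(round(size, 4))  # 0.05
```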

(c) Compute the power of this test when the true value is \(\beta^* = 6\) (the program truly raises wages by $6,000).

Hint: Under \(\beta^* = 6\), \(\hat{\beta} \sim N(6, 4)\), so \(Z = \hat{\beta}/2 \sim N(3, 1)\). Compute \(P(|Z| > 1.96)\) using the substitution \(W = Z - 3 \sim N(0,1)\). You may use: \(\Phi(1.04) = 0.851\).

Solution

Step 1: Distribution of \(Z\) under the true \(\beta^*\).

If \(\beta^* = 6\), then \(\hat{\beta} \sim N(\beta^*, \sigma_{\hat{\beta}}^2) = N(6, 4)\). The test statistic (computed as if \(H_0\) were true) is:

\[Z = \frac{\hat{\beta} - 0}{2} = \frac{\hat{\beta}}{2} \sim N\!\left(\frac{6}{2},\, 1\right) = N(3, 1)\]

So under \(\beta^* = 6\), \(Z\) is not \(N(0,1)\) but rather \(N(3, 1)\) — it is shifted to the right by \(\delta = \beta^*/\sigma_{\hat{\beta}} = 6/2 = 3\).

Step 2: Compute power \(= P(|Z| > 1.96)\) under \(Z \sim N(3, 1)\).

Split into two tails: \[\text{Power} = P(Z > 1.96) + P(Z < -1.96)\]

Substitute \(W = Z - 3 \sim N(0,1)\), so \(Z = W + 3\):

Right tail: \[P(Z > 1.96) = P(W + 3 > 1.96) = P(W > 1.96 - 3) = P(W > -1.04) = \Phi(1.04) = 0.851\]

Left tail: \[P(Z < -1.96) = P(W + 3 < -1.96) = P(W < -1.96 - 3) = P(W < -4.96) = \Phi(-4.96) \approx 0.000\]

Combining: \[\text{Power} = 0.851 + 0.000 = \boxed{0.851}\]

Interpretation: If the job training program truly raises wages by $6,000 (with \(\sigma_{\hat{\beta}} = 2\)), there is an 85.1% probability that our test will correctly detect the effect and reject \(H_0\). This is good power.
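
The power calculation in Python, using the exact \(\Phi\) values from statistics.NormalDist rather than the rounded table value \(\Phi(1.04) = 0.851\):

```python
from statistics import NormalDist

Phi = NormalDist().cdf

# Under beta* = 6 with sigma = 2, the shift is delta = 6/2 = 3, so Z ~ N(3, 1).
delta = 6 / 2
# Power = P(Z > 1.96) + P(Z < -1.96) with Z ~ N(delta, 1),
# i.e. (1 - Phi(1.96 - delta)) + Phi(-1.96 - delta) after centering.
power = (1 - Phi(1.96 - delta)) + Phi(-1.96 - delta)
print(round(power, 3))  # about 0.851
```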

(d) Compute the power when \(\beta^* = 2\) (the program raises wages by only $2,000). Compare with part (c) and explain.

Hint: \(\delta = 2/2 = 1\), so \(Z \sim N(1,1)\). You may use: \(\Phi(0.96) = 0.831\).

Solution

Step 1: Distribution of \(Z\) under \(\beta^* = 2\). \[Z = \frac{\hat{\beta}}{2} \sim N\!\left(\frac{2}{2},\, 1\right) = N(1, 1)\]

Now \(\delta = 1\) (the shift from the null is smaller).

Step 2: Compute power.

With \(W = Z - 1 \sim N(0,1)\):

Right tail: \[P(Z > 1.96) = P(W > 0.96) = 1 - \Phi(0.96) = 1 - 0.831 = 0.169\]

Left tail: \[P(Z < -1.96) = P(W < -2.96) = \Phi(-2.96) \approx 0.002\]

Combining: \[\text{Power} = 0.169 + 0.002 = \boxed{0.171}\]

Comparison:

| True \(\beta^*\) | \(\delta = \beta^*/\sigma_{\hat{\beta}}\) | Power | Detect? |
|------------------|-------------------------------------------|-------|---------|
| \(6\) | \(3\) | \(0.851\) | Likely yes |
| \(2\) | \(1\) | \(0.171\) | Likely no |

Why the power is so different: Power depends on how far the true \(\beta^*\) is from the null value (\(\beta_0 = 0\)), measured in units of \(\sigma_{\hat{\beta}}\). This ratio \(\delta = \beta^*/\sigma_{\hat{\beta}}\) is the “signal-to-noise ratio.”

  • When \(\beta^* = 6\): the signal (\(\delta = 3\)) is strong relative to the noise. The distribution of \(Z\) is centred at \(3\), far enough from zero that it exceeds the critical value \(1.96\) most of the time. The test detects the effect about 85% of the time.

  • When \(\beta^* = 2\): the signal (\(\delta = 1\)) is weak. The distribution of \(Z\) is centred at \(1\) and still overlaps heavily with the non-rejection region \([-1.96,\, 1.96]\). The test misses the effect about 83% of the time (Type II error probability \(= 1 - 0.171 = 0.829\)).

Implication: Small effects are hard to detect unless \(\sigma_{\hat{\beta}}\) is small (which requires large \(n\) or large \(\text{SST}_x\)). This connects back to Block 2: increasing \(\text{SST}_x\) reduces \(\text{Var}(\hat{\beta}_1) = \sigma^2/\text{SST}_x\), which reduces \(\sigma_{\hat{\beta}}\), which increases \(\delta\), which increases power.
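
The whole comparison collapses into one function of the signal-to-noise ratio \(\delta\) (Python sketch; note the exact value at \(\delta = 1\) is about \(0.170\), while the \(0.171\) above comes from the rounded table value \(\Phi(0.96) = 0.831\)):

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def power(delta, z_crit=1.96):
    # Power of the two-sided |Z| > z_crit test when Z ~ N(delta, 1).
    return (1 - Phi(z_crit - delta)) + Phi(-z_crit - delta)

print(round(power(3), 3), round(power(1), 3))  # about 0.851 and 0.170
```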

8.6 Summary of Key Formulas

| Concept | Formula |
|---------|---------|
| Total variation in \(x\) | \(\text{SST}_x = \sum_{i=1}^n (x_i - \bar{x})^2\) |
| Variance of \(\hat{\beta}_1\) (Thm. 2.2) | \(\text{Var}(\hat{\beta}_1 \mid \mathbf{x}) = \sigma^2 / \text{SST}_x\) [uses SLR.1–5] |
| Estimated error variance | \(\hat{\sigma}^2 = \sum \hat{u}_i^2 / (n-2)\) |
| Standard error of \(\hat{\beta}_1\) | \(\text{SE}(\hat{\beta}_1) = \hat{\sigma} / \sqrt{\text{SST}_x}\) |
| \(t\)-statistic | \(t = (\hat{\beta}_1 - \beta_{1,0}) / \text{SE}(\hat{\beta}_1) \sim t_{n-2}\) [uses SLR.1–6] |
| Two-sided rejection rule | Reject if \(\lvert t \rvert > t_{\alpha/2,\, n-2}\) |
| One-sided rejection rule | Reject if \(t > t_{\alpha,\, n-2}\) (for \(H_1\colon \beta_1 > \beta_{1,0}\)) |
| \((1-\alpha)\) Confidence interval | \(\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot \text{SE}(\hat{\beta}_1)\) |
| CI–test equivalence | Reject two-sided \(H_0\) iff \(\beta_{1,0} \notin\) CI |
| Type I Error (size) | \(P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha\) |
| Type II Error | \(P(\text{fail to reject } H_0 \mid H_0 \text{ false})\) |
| Power | \(P(\text{reject } H_0 \mid H_0 \text{ false}) = 1 - P(\text{Type II})\) |