```
from dsc80_utils import *
```

# Pre-Lecture Review for Lecture 6 – Hypothesis Testing

## DSC 80, Spring 2024

**This material is review from DSC 10. Since we're going to build off of it in DSC 80 and it's probably been a while since you've done hypothesis testing, make sure to read over this notebook.**

You can also access this notebook by pulling the course GitHub repository and opening `lectures/lec06/pre-lec06.ipynb`.

In Lecture 6, we'll give a bit more context for *why* we're revisiting hypothesis testing. Here, we'll review the hypothesis testing framework you saw in DSC 10 by walking through a concrete example: coin flipping.

## Example: Coin flipping

### Coin flipping

Suppose that we find a coin on the ground and aren't sure if it's a fair coin.

We flip it 100 times and see 59 heads and 41 tails. We consider two possibilities:

- The coin is fair, and we just happened to see 59 heads.
- The coin isn't fair, because it's biased in favor of heads.

At a high level, we want to try and answer the question, **how likely is it that we'd see at least 59 heads in 100 flips of a fair coin?**

- If it's rare to see at least 59 heads in 100 flips of a fair coin, then the evidence suggests our coin isn't fair; in this case, we'd *think* the coin isn't fair.
- If it's not that rare to see at least 59 heads in 100 flips of a fair coin, then we can't say our coin isn't fair; in this case, we'd *think* the coin is fair.

### Setup

- **Observation**: We flipped a coin 100 times, and saw 59 heads and 41 tails.
- **Null Hypothesis**: The coin is fair.
- **Alternative Hypothesis**: The coin is biased in favor of heads.
- **Test Statistic**: Number of heads, $N_H$.

### Generating the null distribution

Now that we've chosen a test statistic, we need to generate the distribution of the test statistic under the assumption the null hypothesis is true, i.e. the **null distribution**.

This distribution will give us, for example:

- The probability of seeing exactly 4 heads in 100 flips of a fair coin.
- The probability of seeing at most 46 heads in 100 flips of a fair coin.
- **The probability of seeing at least 59 heads in 100 flips of a fair coin.**

The whole point of generating this distribution is to **quantify how rare our observation was**.

- If the probability of seeing at least 59 heads in 100 flips of a fair coin is large, then our outcome was not that rare.
- If that probability is small, then our outcome was rare.

In the diagram below, let $\theta$ represent a simulated test statistic, and let $\hat{\theta}$ represent the observed statistic (59 in our case).

### Generating the null distribution

In this case, we can actually find the null distribution using math.

- The number of heads in $N$ flips of a fair coin follows the $\text{Binomial}(N, 0.5)$ distribution. In our case, $N = 100$, so for $k = 0, 1, \ldots, 100$:

$$P(\text{# heads} = k) = {100 \choose k} (0.5)^k{(1-0.5)^{100-k}} = {100 \choose k} 0.5^{100}$$
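As a minimal sketch of what "using math" could look like here, the formula above can be evaluated directly with Python's built-in `math.comb`. The helper name `binomial_pmf` and the variable `prob_at_least_59` are made up for this illustration; they are not part of the course code.

```python
from math import comb

def binomial_pmf(k, n=100, p=0.5):
    """P(# heads = k) in n flips of a coin that lands heads with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(# heads >= 59) for a fair coin: add up the formula over k = 59, 60, ..., 100.
prob_at_least_59 = sum(binomial_pmf(k) for k in range(59, 101))
print(prob_at_least_59)
```

This exact value is a useful benchmark: a simulated estimate of the same probability should land close to it.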

But, we'll often pick test statistics for which we don't know the true probability distribution. In such cases, we'll have to **simulate, as we did in DSC 10**. That's what we'll do in this example, too.

Simulations provide us with **empirical distributions of test statistics**; if we simulate with a large (>= 10,000) number of repetitions, the empirical distribution of the test statistic should look similar to the true probability distribution of the test statistic, thanks to the **law of large numbers**.

### Generating the null distribution

First, let's figure out how to perform one instance of the experiment – that is, how to flip 100 coins once. Recall, to sample from a categorical distribution, we use `np.random.multinomial`.

```
# Flipping a fair coin 100 times.
# Interpret the result as [Heads, Tails].
np.random.multinomial(100, [0.5, 0.5])
```

array([57, 43])

Then, we can repeat it a large number of times.

```
# 100,000 times, we want to flip a coin 100 times.
results = []
for _ in range(100_000):
    num_heads = np.random.multinomial(100, [0.5, 0.5])[0]
    results.append(num_heads)
```

Each entry in `results` is the number of heads in 100 simulated coin flips.

```
results[:10]
```

[46, 48, 57, 55, 50, 50, 58, 52, 48, 51]
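The loop above works, but the same simulation can likely be done without a loop. The sketch below assumes we pass the `size` argument of `np.random.multinomial` to run all 100,000 experiments in one call; the names `flips` and `results_vectorized` are chosen just for this example.

```python
import numpy as np

# Each row is one experiment of 100 fair flips: [number of heads, number of tails].
flips = np.random.multinomial(100, [0.5, 0.5], size=100_000)

# Keep only the heads column, giving one test statistic per experiment.
results_vectorized = flips[:, 0]
results_vectorized[:10]
```

The result is a NumPy array rather than a list, which makes later steps (like comparing every simulated statistic to 59) a single vectorized operation.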

### Visualizing the empirical distribution of the test statistic

```
import plotly.express as px
fig = px.histogram(
    pd.DataFrame(results, columns=['# Heads']),
    x='# Heads',
    nbins=50,
    histnorm='probability',
    title='Empirical Distribution of # Heads in 100 Flips of a Fair Coin',
)
fig.update_layout(xaxis_range=[0, 100])
```
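The histogram shows the shape of the empirical distribution, but the original question was numeric: how likely is it to see at least 59 heads in 100 flips of a fair coin? One way to estimate that from a simulation is sketched below. It re-runs the simulation so that it stands on its own; the seed value and the names `simulated_heads`, `observed_heads`, and `prop_at_least_observed` are assumptions made for this example.

```python
import numpy as np

np.random.seed(80)  # arbitrary seed, used only so this sketch is reproducible

# Simulate the number of heads in 100 flips of a fair coin, 100,000 times.
simulated_heads = np.random.multinomial(100, [0.5, 0.5], size=100_000)[:, 0]

observed_heads = 59

# Proportion of simulated experiments with at least as many heads as we observed.
prop_at_least_observed = (simulated_heads >= observed_heads).mean()
print(prop_at_least_observed)
```

A small proportion would mean that our observation was rare under the null hypothesis; a large one would mean it was not that rare.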