In [1]:

```
import pandas as pd
import numpy as np
import os
import seaborn as sns
import plotly.express as px
pd.options.plotting.backend = 'plotly'
from ipywidgets import interact
```

We'll look at several examples and cover the necessary theory along the way.

- Coin flipping
- Total variation distance
- Penguin bill lengths 🐧

"Standard" hypothesis testing helps us answer questions of the form:

> I have a population distribution, and I have one sample. Does this sample look like it was drawn from the population?

- Sample: 59 heads and 41 tails. Population: A fair coin.

- Sample: Ethnic distribution of UCSD. Population: Ethnic distribution of California. (Comparing categorical distributions with the TVD.)

- Sample: Penguins from Torgersen Island. Population: All 333 penguins.
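The UCSD example compares two categorical distributions using the total variation distance (TVD): half the sum of the absolute differences between the two distributions' proportions. Below is a minimal sketch of that computation. The function name `total_variation_distance` and the four-category proportions are hypothetical, made up for illustration and not real UCSD or California data.

```python
import numpy as np

def total_variation_distance(dist1, dist2):
    """Half the sum of absolute differences between two categorical distributions.

    Assumes both inputs list proportions over the same categories, in the same
    order, and that each sums to 1.
    """
    dist1 = np.asarray(dist1, dtype=float)
    dist2 = np.asarray(dist2, dtype=float)
    return np.abs(dist1 - dist2).sum() / 2

# Hypothetical proportions over four categories (illustrative numbers, not real data).
sample_dist = np.array([0.30, 0.40, 0.20, 0.10])
population_dist = np.array([0.25, 0.35, 0.25, 0.15])

observed_tvd = total_variation_distance(sample_dist, population_dist)
print(observed_tvd)
```

The TVD is 0 when the two distributions are identical and 1 when they share no categories at all, so larger values indicate a bigger discrepancy between sample and population.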

Let's recap the example we saw last time.

**Observation**: We flipped a coin 100 times, and saw 59 heads and 41 tails.

**Null Hypothesis**: The coin is fair.

**Alternative Hypothesis**: The coin is biased in favor of heads.

**Test Statistic**: Number of heads, $N_H$.
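To make the test statistic concrete, here is a small sketch that computes $N_H$ from an array of flips, both for a reconstruction of the observed sample and for one simulated experiment under the null hypothesis. Representing flips as `'H'`/`'T'` strings and the helper name `num_heads` are assumptions made for illustration.

```python
import numpy as np

def num_heads(flips):
    """Test statistic N_H: the number of 'H' entries in an array of coin flips."""
    return int(np.count_nonzero(np.asarray(flips) == 'H'))

# Reconstruct the observed sample: 59 heads and 41 tails (order doesn't affect N_H).
observed = np.array(['H'] * 59 + ['T'] * 41)
observed_stat = num_heads(observed)

# One simulated set of 100 flips under the null hypothesis (a fair coin).
rng = np.random.default_rng(42)  # fixed seed for reproducibility
simulated = rng.choice(['H', 'T'], size=100)
simulated_stat = num_heads(simulated)

print(observed_stat, simulated_stat)
```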

- Now that we've chosen a test statistic, we need to generate the distribution of the test statistic under the assumption that the null hypothesis is true, i.e. the **null distribution**.

- This distribution will give us, for instance:
    - The probability of seeing 4 heads in 100 flips of a fair coin.
    - The probability of seeing at most 46 heads in 100 flips of a fair coin.
    - **The probability of seeing at least 59 heads in 100 flips of a fair coin.**
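One way to approximate the null distribution is by simulation: repeat the 100-flip experiment many times under the null hypothesis and record $N_H$ each time. The sketch below swaps in `numpy`'s `Generator.binomial`, which draws the number of heads in 100 fair flips directly; the seed and the number of repetitions are arbitrary choices. It then estimates each of the three probabilities listed above as the fraction of simulations satisfying the condition.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the estimates are reproducible
num_simulations = 100_000

# Each entry is N_H for one simulated experiment of 100 fair-coin flips.
simulated_heads = rng.binomial(n=100, p=0.5, size=num_simulations)

# Empirical estimates of the three probabilities listed above.
p_exactly_4 = np.mean(simulated_heads == 4)
p_at_most_46 = np.mean(simulated_heads <= 46)
p_at_least_59 = np.mean(simulated_heads >= 59)

print(p_exactly_4, p_at_most_46, p_at_least_59)
```

Simulated estimates fluctuate from run to run; the exact probabilities come from the binomial formula.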

The number of heads in 100 flips of a fair coin follows the $\text{Binomial}(100, 0.5)$ distribution, in which

$$P(N_H = k) = {100 \choose k} (0.5)^k (1-0.5)^{100-k} = {100 \choose k} (0.5)^{100}$$

In [2]:

```
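from scipy.special import comb

def p_k_heads(k):
    # Binomial(100, 0.5) probability of exactly k heads: C(100, k) * 0.5^100.
    # Works elementwise when k is an array or Series.
    return comb(100, k) * (0.5) ** 100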
```
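As a sanity check on `p_k_heads`, the sketch below compares the formula above against `scipy.stats.binom`, SciPy's built-in binomial distribution, for every possible $k$ from 0 to 100. It also confirms that the probabilities sum to 1, as any valid distribution must.

```python
import numpy as np
from scipy.special import comb
from scipy.stats import binom

k = np.arange(101)  # every possible number of heads, 0 through 100

# Formula from the text: C(100, k) * 0.5^100
pmf_formula = comb(100, k) * 0.5 ** 100

# The same probabilities from SciPy's Binomial(100, 0.5) distribution.
pmf_scipy = binom.pmf(k, 100, 0.5)

# Check that the two agree up to floating-point tolerance,
# and that the probabilities sum to 1.
print(np.allclose(pmf_formula, pmf_scipy))
print(pmf_formula.sum())
```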

The probability that we see at least 59 heads is then:

In [3]:

```
sum([p_k_heads(k) for k in range(59, 101)])
```

Out[3]:

0.04431304005703377
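The same tail probability can be computed in one call with the survival function of `scipy.stats.binom`. Since $N_H$ is integer-valued, $P(N_H \geq 59) = P(N_H > 58)$, and `binom.sf(58, 100, 0.5)` returns exactly $P(N_H > 58)$.

```python
from scipy.stats import binom

# P(N_H >= 59) = P(N_H > 58) = sf(58) = 1 - cdf(58)
p_value = binom.sf(58, 100, 0.5)
p_value_via_cdf = 1 - binom.cdf(58, 100, 0.5)

print(p_value, p_value_via_cdf)
```

Using `sf` rather than `1 - cdf` avoids losing precision when the tail probability is very small.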

Let's look at this distribution visually.

In [4]:

```
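plot_df = pd.DataFrame().assign(k=range(101))
plot_df['p_k'] = p_k_heads(plot_df['k'])
# Highlight the outcomes at least as extreme as the observed 59 heads.
plot_df['color'] = plot_df['k'].apply(lambda k: 'orange' if k >= 59 else 'blue')
# color_discrete_map makes the bars actually render in the named colors.
fig = plot_df.plot(kind='bar', x='k', y='p_k', color='color', width=1000,
                   color_discrete_map={'orange': 'orange', 'blue': 'blue'})
fig.add_annotation(text='This orange area is called the p-value!', x=77, y=0.008, showarrow=False)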
```
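The highlighted region is exactly the set of outcomes with $k \geq 59$, so adding up the heights of those bars recovers the p-value computed earlier. The self-contained sketch below rebuilds the plotted table without the plotting step and sums the highlighted rows.

```python
import numpy as np
import pandas as pd
from scipy.special import comb

# Rebuild the plotted table: one row per possible number of heads k.
plot_df = pd.DataFrame({'k': np.arange(101)})
plot_df['p_k'] = comb(100, plot_df['k'].to_numpy()) * 0.5 ** 100

# The highlighted bars are the rows with k >= 59; their heights sum to the p-value.
highlighted = plot_df[plot_df['k'] >= 59]
p_value = highlighted['p_k'].sum()

print(len(highlighted), p_value)
```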