Lecture 11 – Permutation Testing, Missingness Mechanisms

DSC 80, Spring 2023

Agenda

Differences between categorical distributions

Hypothesis testing vs. permutation testing

"Standard" hypothesis testing helps us answer questions of the form:

I have a population distribution, and I have one sample. Does this sample look like it was drawn from the population?

Permutation testing helps us answer questions of the form:

I have two samples, but no information about any population distributions. Do these samples look like they were drawn from the same population?
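
As a rough sketch of the two-sample question above: a permutation test pools both samples, repeatedly shuffles the pooled values into two groups of the original sizes, and checks how often the shuffled groups differ at least as much as the observed ones did. The helper name `permutation_p_value`, the choice of the absolute difference in means as the test statistic, and the toy numbers are all assumptions made for illustration; they are not taken from the lecture.

```python
import numpy as np

def permutation_p_value(sample_a, sample_b, n_repetitions=1000, seed=0):
    """Estimate a p-value for the null hypothesis that both samples
    were drawn from the same population (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)

    # Assumed test statistic: absolute difference in group means.
    observed = abs(sample_a.mean() - sample_b.mean())

    # Under the null, group labels are interchangeable, so we pool the data.
    pooled = np.concatenate([sample_a, sample_b])
    n_a = len(sample_a)

    simulated = np.empty(n_repetitions)
    for i in range(n_repetitions):
        # Shuffle the pooled values and split them into two new "groups".
        shuffled = rng.permutation(pooled)
        simulated[i] = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())

    # Fraction of shuffles at least as extreme as what we observed.
    return float(np.mean(simulated >= observed))

# Made-up toy samples, only to show the call.
group_a = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2]
group_b = [3.0, 3.4, 2.9, 3.6, 3.1, 3.3]
p_value = permutation_p_value(group_a, group_b)
```

Note that no population distribution appears anywhere in the code: the null distribution of the statistic is built entirely from the two samples themselves.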

Example: Married vs. unmarried couples

Let's load in a cleaned version of the couples dataset from the last lecture.

Understanding employment status in households

To understand how employment status relates to household type, let's compute the distribution of employment status conditional on household type (married vs. unmarried).
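
One way this computation might look in pandas is sketched below. The column names `mar_status` and `empl_status`, the category labels, and the rows themselves are made up for illustration; the real couples dataset may be organized differently.

```python
import pandas as pd

# Made-up toy rows; column names and labels are assumptions.
couples = pd.DataFrame({
    "mar_status": ["married"] * 4 + ["unmarried"] * 4,
    "empl_status": [
        "employed", "employed", "employed", "self-employed",
        "employed", "employed", "not employed", "not employed",
    ],
})

# Count each (employment status, household type) pair, then divide every
# column by its total. Each column is then the distribution of employment
# status given one household type, so each column sums to 1.
cond_distr = pd.crosstab(
    couples["empl_status"],
    couples["mar_status"],
    normalize="columns",
)
```

Normalizing by column (rather than by row) is what makes this "employment status conditional on household type"; normalizing by row would instead give household type conditional on employment status.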

Differences in the distributions

Is the distribution of employment status for married people different from the distribution for unmarried people who live with their partners?

Is this difference just due to noise?
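
A minimal sketch of how a permutation test could address this question for categorical data is shown below. It assumes the total variation distance (TVD) as the test statistic, which is one common choice for comparing two categorical distributions; the helper names `tvd_between_groups` and `permutation_test_tvd`, the column names, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd

def tvd_between_groups(df, group_col, category_col):
    """TVD between the category distributions of exactly two groups."""
    distr = pd.crosstab(df[category_col], df[group_col], normalize="columns")
    # TVD = half the sum of absolute differences in proportions.
    return float((distr.iloc[:, 0] - distr.iloc[:, 1]).abs().sum() / 2)

def permutation_test_tvd(df, group_col, category_col,
                         n_repetitions=500, seed=0):
    """Return the observed TVD and an estimated p-value."""
    rng = np.random.default_rng(seed)
    observed = tvd_between_groups(df, group_col, category_col)

    shuffled = df[[group_col, category_col]].copy()
    simulated = np.empty(n_repetitions)
    for i in range(n_repetitions):
        # Shuffle only the group labels; group sizes stay the same.
        shuffled[group_col] = rng.permutation(df[group_col].to_numpy())
        simulated[i] = tvd_between_groups(shuffled, group_col, category_col)

    # Large TVDs are evidence against "same distribution".
    p_value = float(np.mean(simulated >= observed))
    return observed, p_value

# Made-up toy data: 40 married and 40 unmarried respondents.
couples = pd.DataFrame({
    "mar_status": ["married"] * 40 + ["unmarried"] * 40,
    "empl_status": (["employed"] * 30 + ["not employed"] * 10
                    + ["employed"] * 20 + ["not employed"] * 20),
})
observed_tvd, p_value = permutation_test_tvd(
    couples, "mar_status", "empl_status"
)
```

A small p-value would suggest that a difference as large as the observed one is unlikely to arise from random shuffling alone, i.e. that it is probably not just noise.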