# Lecture 11 – Permutation Testing, Missingness Mechanisms

## DSC 80, Winter 2023

### Announcements

• Lab 3's reflection form is due for extra credit tomorrow at 11:59PM.
• Lab 4 (hypothesis and permutation testing) is due on Monday, February 6th at 11:59PM.
• Project 2 is due on Thursday, February 9th at 11:59PM.
  • See this post on Ed for help with Question 7 if you'd like to finish the project before the weekend. (We'll cover the relevant lecture material on Monday.)
• Several assignment grades have been released; check Gradescope and Ed for details.
• Many students' Project 1 grades increased last night 👀.

### Agenda

• Using permutation testing to compare two categorical distributions.
• Missingness mechanisms.
  • In what ways can data be missing? Why do we care?
  • How do we identify missingness mechanisms using data?

## Differences between categorical distributions

### Hypothesis testing vs. permutation testing

"Standard" hypothesis testing helps us answer questions of the form:

I have a population distribution, and I have one sample. Does this sample look like it was drawn from the population?

Permutation testing helps us answer questions of the form:

I have two samples, but no information about any population distributions. Do these samples look like they were drawn from the same population?

### Example: Married vs. unmarried couples

Let's load in a cleaned version of the couples dataset from the last lecture.

### Understanding employment status in households

• Do married households more often have a stay-at-home spouse?
• Do households with unmarried couples more often have someone looking for work?
• How much does the employment status of the different households vary?

To answer these questions, let's compute the distribution of employment status conditional on household type (married vs. unmarried).
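As a rough sketch of that computation, the conditional distribution can be built with `pd.crosstab` and `normalize="columns"`. The DataFrame below is a tiny made-up stand-in for the couples data, and the column names `mar_status` and `empl_status` are assumptions for illustration, not necessarily the names in the actual dataset.

```python
import pandas as pd

# Hypothetical toy data: values and column names are illustrative only,
# not taken from the real couples dataset.
couples = pd.DataFrame({
    "mar_status": ["married", "married", "married", "married",
                   "unmarried", "unmarried", "unmarried", "unmarried"],
    "empl_status": ["Working", "Working", "Not in labor force", "Working",
                    "Working", "Looking for work", "Working", "Not in labor force"],
})

# Rows are employment statuses, columns are household types.
# normalize="columns" rescales each column to sum to 1, so each column is
# the distribution of employment status given that household type.
cond_distr = pd.crosstab(
    couples["empl_status"], couples["mar_status"], normalize="columns"
)
print(cond_distr)
```

Each column of `cond_distr` is one categorical distribution, so comparing married and unmarried households amounts to comparing the two columns.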

### Differences in the distributions

Is the distribution of employment status for married people different from the distribution for unmarried people who live with their partners?

Or is any difference we observe between the two distributions just due to noise?
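One way to probe that question is a permutation test: shuffle the household-type labels many times and see how often the shuffled data produce a difference at least as large as the observed one. The sketch below uses total variation distance (TVD) as the test statistic, which is a common choice for comparing two categorical distributions but is an assumption here. The helper names, the toy data, and the column names `mar_status` and `empl_status` are all hypothetical.

```python
import numpy as np
import pandas as pd

def tvd_between_groups(df, group_col, cat_col):
    """TVD between the categorical distributions of exactly two groups."""
    dist = pd.crosstab(df[cat_col], df[group_col], normalize="columns")
    # TVD is half the sum of absolute differences in proportions.
    return (dist.iloc[:, 0] - dist.iloc[:, 1]).abs().sum() / 2

def permutation_test(df, group_col, cat_col, n_repetitions=1000, seed=0):
    """Return the observed TVD and an estimated p-value."""
    rng = np.random.default_rng(seed)
    observed = tvd_between_groups(df, group_col, cat_col)
    shuffled = df[[group_col, cat_col]].copy()
    simulated = np.empty(n_repetitions)
    for i in range(n_repetitions):
        # Shuffling the group labels breaks any link between group and category,
        # which simulates the null hypothesis of one shared population.
        shuffled[group_col] = rng.permutation(df[group_col].to_numpy())
        simulated[i] = tvd_between_groups(shuffled, group_col, cat_col)
    # Proportion of shuffles at least as extreme as what was observed.
    p_value = (simulated >= observed).mean()
    return observed, p_value

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "mar_status": ["married"] * 4 + ["unmarried"] * 4,
    "empl_status": ["Working", "Working", "Not in labor force", "Working",
                    "Working", "Looking for work", "Working", "Not in labor force"],
})

observed_tvd, p_value = permutation_test(toy, "mar_status", "empl_status", n_repetitions=500)
print(observed_tvd, p_value)
```

A large p-value suggests the observed difference is consistent with noise; a small one suggests the two groups' distributions really do differ. With only eight toy rows the result is not meaningful, so treat this purely as a template.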